Deep Interactive Region Segmentation and Captioning
Abstract
Recent advances in dense image captioning make it possible to describe every object in a scene with a caption, with each object localized by a bounding box. However, interpreting such output is not trivial because many of the bounding boxes overlap. Furthermore, current captioning frameworks give the user no way to express personal preferences and exclude areas that are not of interest. In this paper, we propose a novel hybrid deep learning architecture for interactive region segmentation and captioning in which the user can specify an arbitrary region of the image to be processed. To this end, a dedicated Fully Convolutional Network (FCN), named Lyncean FCN (LFCN), is trained on our specially prepared training data to isolate the User Intention Region (UIR) as the output of an efficient segmentation. In parallel, a dense image captioning model provides a wide variety of captions for that region. The UIR is then described by the caption of the best-matching bounding box. To the best of our knowledge, this is the first work that provides such a comprehensive output. Our experiments show that the proposed approach outperforms state-of-the-art interactive segmentation methods on several well-known datasets. In addition, replacing the bounding boxes with the result of the interactive segmentation makes the dense image captioning output easier to understand and improves object detection accuracy in terms of Intersection over Union (IoU).
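The matching step mentioned in the abstract (describing the UIR with the caption of the best-matching bounding box) can be illustrated with a minimal sketch. The abstract does not state the matching criterion, so this example assumes one plausible choice: take the tight bounding box of the segmentation mask and pick the captioned box with the highest IoU against it. The names `mask_to_box`, `iou`, and `best_caption`, and the sample captions, are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Box format (assumed): (x_min, y_min, x_max, y_max), with max edges exclusive.
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Box:
    """Return the tight bounding box around all non-zero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask is empty")
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_caption(mask: List[List[int]],
                 captioned_boxes: List[Tuple[Box, str]]) -> Tuple[Box, str]:
    """Pick the (box, caption) pair whose box best overlaps the mask's box.

    Assumption: the match is scored by box-to-box IoU; the paper may use
    a different criterion.
    """
    region = mask_to_box(mask)
    return max(captioned_boxes, key=lambda bc: iou(region, bc[0]))


# Toy 4x6 mask standing in for the segmented User Intention Region.
uir_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Hypothetical output of a dense captioning model: boxes with captions.
candidates = [
    ((0, 0, 6, 4), "a room"),
    ((2, 1, 5, 3), "a red cup"),
    ((4, 2, 6, 4), "a table edge"),
]

box, caption = best_caption(uir_mask, candidates)
print(caption)
```

In this toy case the second candidate coincides with the mask's bounding box, so its caption is selected; the large overlapping box that covers the whole image scores lower, which is the kind of ambiguity the abstract attributes to overlapping bounding boxes.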
Journal title:
- CoRR
Volume abs/1707.08364
Publication date 2017